Abstract. This paper analyzes different online algorithms for the problem of
assigning weights to edges in a fully-connected bipartite graph so as to minimize
the overall cost while satisfying constraints. Edges in this graph may disappear
and reappear over time. The performance of these algorithms is measured using
simulations. This paper also attempts to derandomize the randomized online
algorithm for this problem.



1 Scope 

This paper aims to analyze online algorithms [6] for dynamically evolving graphs [12][13][14].
The input consists of a bipartite graph $G = (V, E)$ with two types of nodes, consumers
$C$ and producers $P$ ($V = C \cup P$), and edges $E$ where $\{e_{ij} \in E : i \in C, j \in P\}$, together with
attribute arrays associated with nodes, $a_{v_i}(t) = [a_{v_i 1}, a_{v_i 2}, a_{v_i 3}, \ldots], \forall v_i \in V$, and edges,
$a_{e_{ij}}(t) = [a_{e_{ij} 1}, a_{e_{ij} 2}, a_{e_{ij} 3}, \ldots], \forall e_{ij} \in E$, whose values may change over time.

A sequence of online service requests $R = R(t) R(t+1) R(t+2) \cdots$ is received
as input. Each request consists of one or more consumer demands and edge failures. A service
request can contain more than one demand, one per consumer, $R_k(t), k \in C$, so that
$R(t) = \bigcup_{k=1}^{|C|} R_k(t)$ (corresponding to a multi-tape Turing machine). Here, $t$ is
the instance when service request $R(t)$ is received and is an increasing function of time.
Demands act by either removing or adding edges, or by modifying edge attributes.

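The objective is to minimize the overall cost of weight assignments such that it is
not much worse than the cost of the optimal offline solution:

$$\sum_{i \in C, j \in P} f(a_{e_{ij}}(t), R(t)) \cdot e_{ij}(t) \le \alpha \cdot OPT(t), \quad \forall t \in T \qquad (1)$$

with constraints for consumers,

$$f(a_{e_{ij}}(t), R(t)) = f(a_{v_i}(t)), \quad \forall i \in C \qquad (2)$$

and producers,

$$f(a_{e_{ij}}(t), R(t)) = f(a_{v_j}(t)), \quad \forall j \in P \qquad (3)$$

The dynamic nature of the edges is characterized by the following:

$$e_{ij}(t) = \begin{cases} 1 & \text{if there is an edge between } i \in C \text{ and } j \in P \text{ at instance } t \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$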

This paper focuses on the optimal offline strategy and tries to find competitive
online algorithms for this problem. It uses randomization to make
the online algorithms more competitive, and also attempts to derandomize the
randomized online algorithm.
2 Problem Definition



This paper considers a subset of the problem specified in section 1. Given a complete
bipartite graph $G = (V, E)$ where $V$ is a finite set of nodes which consists of consumers
$c_i, i \in C$ with indegree zero and producers $p_j, j \in P$ with outdegree zero such that
$V = C \cup P$, and edges $e_{ij} \in E$ where $|E| = |V|^2$ with distances $d_{ij}$ between them.

Problem 1. Online service requests $R = R_1 R_2 \cdots R_{n_1}$, $n_1 = |C|$, are received as input
such that each service request has a unique demand $R_k, k \in C$ ($\bigcup_k R_k = R$, $R_i \ne R_j$,
$\forall i, j \in C$). These demands act by either increasing the edge weights $w_{ij}$ or
removing an edge by setting $d_{ij} = \infty$. Edge weights $w_{ij}$ assigned to the edges cannot
be reduced apart from the case where an edge goes down. In this case, weights are set
to zero: $w_{ij} = 0, \forall e_{ij} = 0$.

Find an $\alpha$-competitive online algorithm for satisfying the service requests $R$ that
minimizes the sum of weights:

$$\sum_{i \in C, j \in P} w_{ij}(t) \cdot d_{ij}(t) \cdot e_{ij}(t) \le \alpha \cdot OPT(t), \quad \forall t \in T \qquad (5)$$

where $\alpha$ is a constant and $OPT(t)$ is the output of the optimal offline algorithm at
instance $t$, such that

$$\sum_{j \in P} w_{ij}(t) = R_i, \quad \forall i \in C \qquad (6)$$

The constraint in equation (6) guarantees that the demands generated by the consumer until
now are satisfied.

$$\sum_{i \in C} w_{ij}(t) \le M_j, \quad \forall j \in P \qquad (7)$$

The constraint in equation (7) guarantees limited capacities for the producers.

Problem 2. Consider a version of the problem [2] where edge distances dij can change 
as specified by the service requests. 

Problem 3. Consider a version of the problem [2] where capacities associated with the 
producers Mj can change as specified by the service request. 

Problem 4. Consider a version of the problem [2] where new consumers / producers can
be added to the graph and some of the existing consumers / producers can go down.

3 Motivation 

The VMs running in a distributed system can be considered as the consumers and
the data-centers as the producers of storage. The capacity of a producer can be
considered as an attribute of the producer. The average time per I/O operation, or
latency, between a VM and a data-center can be considered as the distance of the edge.
The sequence of I/O requests generated by the consumers can be considered as the
service requests in the problem [2]. Edge failures are equivalent to the data-center
or the VM being down.



The objective of this paper is to find a scheme for allocating these requests so that
the overall cost of I/O operations at any instant is not more than $\alpha$ times worse than
the optimal cost OPT that can be achieved when all the service requests are known at
the beginning.

As more and more data moves to the cloud every day, it becomes important to analyze 
distributed resource scheduling schemes for better performance of VMs with respect to 
I/O read and write operations. To make the storage management transparent to the 
users of the Cloud platform it is important to have automated storage management 
schemes running on the cloud platform that make the best use of the available storage 
while guaranteeing good performance to the users. Aggregating the available storage 
across the distributed system into resource pools and distributing them prevents the 
storage from being wasted. 

4 Introduction 

One of the well-known distributed resource scheduling schemes is used in VMware's
virtualization framework [33]: first in Virtual Infrastructure using VirtualCenter, a centralized
distributed system, and more recently in vSphere, a cloud OS. Both these systems work by
pooling the available storage into resource pools in a tree-like data structure. The paper
[15] analyzes the various distributed resource allocation techniques used in distributed
systems.

We consider the theoretical aspects of this problem of allocating storage optimally
to the VMs. Some of the points to be noted are as follows:

• Storage associated with a VM is a non-decreasing function of time.

• Previously allocated storage cannot be moved to another data-center as this can 
be very time-consuming for large storage sizes. 

• Some VMs may consume storage at a faster rate and can starve other VMs. 

• To reduce the complexity we only consider a graph with fixed number of nodes; 
new VMs or data-centers are not added dynamically. 

Similar problems involving min-flow [5], online matching [3][5], dynamic assignment
[11], LP techniques [8][9], bipartite network flow [7], combinatorial optimization
[21] and distributed resource allocation [20][19][10] have been studied earlier. The Hungarian
algorithm [1] is one of the earliest known methods for solving the assignment problem;
it uses the primal-dual method. It is to be noted that the problem studied in this
paper does not require a matching.

Fairness of resource allocation [16][17] and dynamic load balancing [24][23][31] issues
related to these problems have also been studied before. [15] studies how randomization
can be used to improve the competitiveness of online algorithms. [22] throws light on
how such schemes could be adapted to large scale cloud-computing platforms.

5 Offline Algorithms 

The optimal offline algorithm for this problem is a Linear Program. It seems trivial at first
to iterate through the demands and allocate weights corresponding to demands on the
edges with the least distance $d_{ij}$ that are connected to producers with available capacity.
However, the order in which demands are considered affects the cost of the output. The
optimal algorithm has to look at all the demands simultaneously and then look at all
the available edges where the demands can be allocated.

Due to this, one has to consider all possible assignments of demands to edges in order
to find the least cost assignment. This paper uses LP for solving the offline version due
to the ready availability of LP code, which is used for the simulations in section 7.

5.1 LP 

The LP formulation for this problem is as follows.
Objective function:

$$\text{Minimize: } \sum_{i \in C, j \in P} d_{ij} \cdot w_{ij}, \quad w_{ij} \ge 0, \; d_{ij} > 0 \qquad (8)$$

As the demands are non-negative the weights are also non-negative. Edges that fail
(equation (4)) have their corresponding $d_{ij}$'s set to infinity so that they are not selected.

Constraints:

$$\sum_{j \in P} w_{ij} \ge R_i, \quad \forall i \in C \qquad (9)$$

Equation (9) states that the total demand of a consumer $i \in C$ should be met.

$$\sum_{i \in C} w_{ij} \le M_j \implies -\sum_{i \in C} w_{ij} \ge -M_j, \quad \forall j \in P \qquad (10)$$

Equation (10) states that the capacities of the producers cannot be exceeded.



Lemma 1. The LP formulation in 5.1 produces a valid assignment of weights $w_{ij}$ on edges
$e_{ij}$ corresponding to the demands $R$.

Proof. Equation (9) guarantees that the total demand generated by consumer $i \in C$ is
satisfied. Equation (10) ensures that the capacities of producers $j \in P$ are not exceeded.
By definition [2] this is a valid assignment of weights on edges. □



Lemma 2. The LP formulation in 5.1 produces the optimal assignment of weights $w_{ij}$ on
edges $e_{ij}$ corresponding to the demands in $R$.

Proof. Lemma 1 ensures that this LP produces a valid solution. Since the objective (8) is
a minimization function and fractional weights are allowed, it follows that the solution
produced by the LP is the optimal solution. □



For edge failures, the LP formulation in 5.1 has to be modified by removing the failed
edges, adding constraints for the current weight assignments and adding the demands.



6 Online Algorithms 



Online algorithms are used for solving problems where the entire input is not known
and partial decisions have to be made at each step. The input is received incrementally.

The competitive ratio is used to measure the performance of an online algorithm as compared
to the optimal offline algorithm that knows the entire input.

An $\alpha$-competitive online algorithm ALG is defined as follows with respect to an
optimum offline algorithm OPT:

$$cost(ALG(I)) \le \alpha \cdot cost(OPT(I)) + \beta \qquad (11)$$

In the definition of an online algorithm in equation (11), $\alpha$ is called the competitive ratio
and $\beta$ can be considered as the startup cost of the algorithm.
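The bound in equation (11) can be checked empirically over a set of simulated inputs. The following is a minimal sketch, assuming the costs of ALG and OPT have already been measured per input; the function name `isCompetitive` and its parameters are hypothetical and not part of the simulation code described later.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if every measured pair of costs satisfies
// cost(ALG(I)) <= alpha * cost(OPT(I)) + beta, as in equation (11).
// algCost[n] and optCost[n] are the costs on the n-th input.
bool isCompetitive(const std::vector<double>& algCost,
                   const std::vector<double>& optCost,
                   double alpha, double beta) {
    if (algCost.size() != optCost.size()) {
        return false;  // the two series must describe the same inputs
    }
    for (std::size_t n = 0; n < algCost.size(); ++n) {
        if (algCost[n] > alpha * optCost[n] + beta) {
            return false;  // bound violated on input n
        }
    }
    return true;
}
```

A check of this kind can only refute a claimed ratio on the inputs tried; it does not prove that the bound holds for all inputs.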

One of the widely studied online problems is the k-server problem [25][26][27][28][30][29],
where k servers have to service n clients and the order of the service requests from the
clients is not known at the beginning. The objective of this problem is to find the
shortest path to serve the clients.



6.1 Greedy Algorithm 

This algorithm looks for the best available edge, $\min d_{ij}$. This leads to myopic
behavior as the algorithm always looks for local minima. For example, consider a graph
consisting of two consumers $c_1$ and $c_2$ with demands $R_1$ and $R_2$ such that $R_1 < R_2$, and
two producers $p_1$ and $p_2$ with capacities $M_1$ and $M_2$ respectively such that $M_1 > 0$
and $R_1 = M_1$. Let the edges be $e_{11}$, $e_{12}$, $e_{21}$ and $e_{22}$ such that $d_{11} = d_{21} = x$ and
$d_{12} = d_{22} = x + \delta$.

Demand $R_1$ from consumer $c_1$ arrives first and is allocated on the cheapest edge $e_{11}$.
Since $R_1 = M_1$, the demand $R_2$ from consumer $c_2$ that arrives next is allocated on the
edge $e_{22}$. The total cost is $x \cdot R_1 + (x + \delta) \cdot R_2 = x \cdot (R_1 + R_2) + \delta \cdot R_2$. The cost
would have been $(x + \delta) \cdot R_1 + x \cdot R_2 = x \cdot (R_1 + R_2) + \delta \cdot R_1$, which is less than the
cost of the output produced by greedy since $R_1 < R_2$.
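The order dependence of the greedy strategy can be reproduced with a few lines of code. The sketch below is an illustration under stated assumptions, not the paper's implementation: the `Instance` struct and `greedyCost` function are hypothetical names, each demand is placed whole on a single edge, and the numbers used afterwards ($x = 1$, $\delta = 1$, demands 2 and 5, capacities 5 and 10) are chosen so that the first producer can hold the larger demand on its own.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical container for a small instance; names are illustrative only.
struct Instance {
    std::vector<std::vector<double>> dist;  // dist[i][j]: distance d_ij of edge e_ij
    std::vector<double> capacity;           // capacity[j]: capacity M_j of producer j
};

// Serves demands in arrival order; demand[i] is the single demand of consumer i.
// Each demand is placed whole on the cheapest edge whose producer still has room.
// Returns the total cost, or -1 if some demand cannot be placed.
double greedyCost(Instance inst, const std::vector<double>& demand) {
    double total = 0.0;
    const std::size_t none = inst.capacity.size();  // sentinel: no edge found
    for (std::size_t i = 0; i < demand.size(); ++i) {
        std::size_t best = none;
        double bestDist = std::numeric_limits<double>::infinity();
        for (std::size_t j = 0; j < inst.capacity.size(); ++j) {
            // A failed edge would carry d_ij = infinity and is never selected.
            if (inst.capacity[j] >= demand[i] && inst.dist[i][j] < bestDist) {
                bestDist = inst.dist[i][j];
                best = j;
            }
        }
        if (best == none) {
            return -1.0;
        }
        inst.capacity[best] -= demand[i];  // remaining availability shrinks
        total += bestDist * demand[i];     // cost contribution d_ij * w_ij
    }
    return total;
}
```

With distances {{1, 2}, {1, 2}} and capacities {5, 10}, serving the demands in the order (2, 5) costs 12, while the order (5, 2) costs 9, which equals the cost of the alternative assignment in which the larger demand gets the cheaper edge.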

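It is observed that,

$$w_{ij} \propto (1/d_{ij}), \quad \forall i \in C, j \in P$$
$$\implies w_{ij} = k \cdot (1/d_{ij}), \quad \forall i \in C, j \in P \qquad (12)$$

Substituting equation (12) in (10) we get:

$$\sum_{i \in C} k \cdot (1/d_{ij}) \le M_j, \quad \forall j \in P$$
$$\implies k \le \Big[ 1 \Big/ \sum_{i \in C} (1/d_{ij}) \Big] \cdot M_j, \quad \forall j \in P \qquad (13)$$

This paper aims to analyze an online algorithm that is more far-sighted. The Randomized-Greedy
algorithm in the next section also considers the non-optimal edges to prevent
getting stuck in local minima.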



6.2 Randomized Greedy Algorithm 



This algorithm intends to be more far-sighted than the greedy algorithm. For this,
it maintains a set of the top k cheapest available edges, sorted by $d_{ij}$. At each step we
consider an edge at random from this set and compare its cost to that of the edge selected
by the Greedy algorithm. If an edge meeting the condition $d(e_R)/d(e_G) < \beta$ is found then
we select this edge. This procedure is repeated up to n times. If no such edge is
found within n iterations then we simply return the edge selected by Greedy.

Algorithm 1 Randomized-Greedy
  S ← FindTopAvail(k)
  e_G ← Greedy(S)
  numIterations ← 0
  maxIterations ← n
  while numIterations < maxIterations do
    numIterations ← numIterations + 1
    e_R ← Random(S)
    if d(e_R)/d(e_G) < β then
      return e_R
    end if
  end while
  return e_G

In this algorithm $\beta$ is called the sub-optimal penalty. When it is set to 1 the algorithm
becomes the greedy algorithm. As $\beta$ increases the algorithm is allowed to select higher
cost edges. The aim is to prevent the algorithm from getting stuck in a local optimum.
As seen in the simulation results (Fig. 3), this algorithm performs better when the
distribution of service requests is not uniform.
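The edge selection of Algorithm 1 can be sketched as follows. This is an illustrative version under stated assumptions: the function name `randomizedGreedyPick` and its parameters are hypothetical, the input is reduced to a list of distances of the currently available edges, and `std::mt19937` stands in for whatever random source the simulation actually uses.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Picks an edge among the k cheapest available ones, following Algorithm 1.
// dists must be non-empty and hold positive distances of the available edges;
// the return value is an index into dists.
std::size_t randomizedGreedyPick(const std::vector<double>& dists, std::size_t k,
                                 double beta, int maxIterations, std::mt19937& rng) {
    k = std::max<std::size_t>(k, 1);  // always keep at least the greedy edge

    // S <- FindTopAvail(k): indices of the k cheapest edges, sorted by distance.
    std::vector<std::size_t> s(dists.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        s[i] = i;
    }
    std::sort(s.begin(), s.end(),
              [&dists](std::size_t a, std::size_t b) { return dists[a] < dists[b]; });
    if (s.size() > k) {
        s.resize(k);
    }

    const std::size_t eG = s.front();  // e_G <- Greedy(S): the cheapest edge
    std::uniform_int_distribution<std::size_t> pick(0, s.size() - 1);
    for (int it = 0; it < maxIterations; ++it) {
        const std::size_t eR = s[pick(rng)];  // e_R <- Random(S)
        if (dists[eR] / dists[eG] < beta) {
            return eR;  // within the sub-optimal penalty
        }
    }
    return eG;  // no acceptable random edge found: fall back to greedy
}
```

With $\beta = 1$ the ratio test can never succeed, because no edge in S is strictly cheaper than the greedy edge, so the sketch returns the greedy choice, in line with the remark above.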

Derandomization of Randomized-Greedy algorithm. This paper aims to measure
the performance of online algorithms using pairwise-independent random numbers as
input in place of real-world data. Recursive n-gram hashing is used to generate
consumer demands and producer capacities. The randomized greedy algorithm is derandomized
by running it multiple times and selecting the output with the least cost among the
available outputs. This method requires more processing power, although it
improves the performance of the randomized algorithm.
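The "run several times and keep the cheapest output" step can be sketched as below. This is a simplified illustration, assuming a fixed range of seeds makes the overall procedure deterministic; the name `bestOfRuns` and the callback `runOnce` (which is expected to execute the randomized algorithm with the given seed and return its total cost) are hypothetical.

```cpp
#include <functional>
#include <limits>

// Runs a randomized procedure once per seed in [0, runs) and keeps the
// cheapest outcome. Returns +infinity when runs is zero.
double bestOfRuns(const std::function<double(unsigned)>& runOnce, unsigned runs) {
    double best = std::numeric_limits<double>::infinity();
    for (unsigned seed = 0; seed < runs; ++seed) {
        const double cost = runOnce(seed);  // one full run with this seed
        if (cost < best) {
            best = cost;
        }
    }
    return best;
}
```

The running time grows linearly with the number of runs, which matches the remark above that this approach needs more processing power.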

7 Simulations 

This paper simulates the online assignment problem using C++ (GCC 4.3.0 20080428), the
LP solver lp_solve v.5.1.1.3 and shell scripts on a Linux (Red Hat 4.3.0-8) platform. A shell
script is used to generate input using the built-in random number generator for edge
distances, producer capacities, consumer demands and edge failures. The online greedy
and randomized algorithms (Algorithm 1) are implemented in C++. The binary file containing
these algorithms takes three parameters: type of algorithm, input file and output file.

The outermost shell script is invoked as:
evaluateDRSUsers1.sh $users $demands $resources $capacities $failures

This script generates an input file for the online algorithms:

no of consumers : 2
no of producers : 2
edge distances
47
17
11
2
producer capacities
26
839
consumer demands
97
78
Number of edge failures: 1
1 <- demand #
3 <- edge #

and for the offline LP solver:

min: 47x1+17x2+11x3+2x4; <- objective function
x1+x3<=26; <- producer constraint
x2+x4<=839;
x1+x2=97; <- consumer constraint
x3+x4=78;
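The LP model shown above can be generated mechanically from the instance data. The following sketch is an assumption about how such a generator could look, not the shell script used in the paper: the function `buildLpModel` is a hypothetical name, and it numbers the edge between consumer i and producer j (both counted from zero) as variable x(i * numProducers + j + 1), which reproduces the variable order of the listing.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Builds the text of an LP model in the same layout as the listing above.
// dist[i][j] is the distance d_ij, capacity[j] is M_j and demand[i] is R_i.
std::string buildLpModel(const std::vector<std::vector<double>>& dist,
                         const std::vector<double>& capacity,
                         const std::vector<double>& demand) {
    const std::size_t nC = demand.size();
    const std::size_t nP = capacity.size();
    auto var = [nP](std::size_t i, std::size_t j) {
        return "x" + std::to_string(i * nP + j + 1);
    };

    std::ostringstream out;
    out << "min: ";  // objective: sum of d_ij * w_ij
    for (std::size_t i = 0; i < nC; ++i) {
        for (std::size_t j = 0; j < nP; ++j) {
            out << (i + j ? "+" : "") << dist[i][j] << var(i, j);
        }
    }
    out << ";\n";
    for (std::size_t j = 0; j < nP; ++j) {  // one capacity constraint per producer
        for (std::size_t i = 0; i < nC; ++i) {
            out << (i ? "+" : "") << var(i, j);
        }
        out << "<=" << capacity[j] << ";\n";
    }
    for (std::size_t i = 0; i < nC; ++i) {  // one demand constraint per consumer
        for (std::size_t j = 0; j < nP; ++j) {
            out << (j ? "+" : "") << var(i, j);
        }
        out << "=" << demand[i] << ";\n";
    }
    return out.str();
}
```

Called with the distances {{47, 17}, {11, 2}}, capacities {26, 839} and demands {97, 78} from the sample input file, the sketch yields the same five statements as the listing, without the explanatory arrows.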

This script then invokes the LP solver:

./lp_solve model.lp >> drsUsersOut.txt

and the binary file containing the greedy and randomized algorithms:

./onlineDRSAlgo1.o greedy onlineIn.txt drsUsersOut.txt

./onlineDRSAlgo1.o randomized onlineIn.txt drsUsersOut.txt

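Main function in C++ for solving the online version of the problem:

int main(int argc, char* argv[]) {
    // Expected arguments: type of algorithm, input file, output file.
    if (argc < 4) {
        return 1;
    }
    GetAlgo(argv[1]);              // "greedy" or "randomized"
    ReadInputFile(argv[2]);        // input file generated by the shell script
    GetNumProducers();
    GetNumConsumers();
    GetEdgeDist();
    InitCEWeights();
    InitEWeights();
    GetConsumerDemands();
    GetProducerCapacities();
    InitEdgeAvail();
    GetEdgeFailures();
    AllocateDemands();             // runs the selected online algorithm
    WriteOutputFile(argv[3]);      // total cost and edge weights
    return 0;
}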



Fig. 1. Cost vs Offline Optimal (Performance of Offline Optimal algorithm)

Fig. 2. Cost vs Online Greedy

Fig. 3. Cost vs Online Randomized (Performance of Online Randomized algorithm)



The weights of edges are initialized to zero. The availabilities of edges are initialized
to the capacity of the producers they connect to. For the greedy algorithm, the program chooses
the available edge with the least cost. For the randomized algorithm, the program
first sorts the available edges in increasing order of cost. It then picks an edge suggested by the
srand() and rand() functions such that the cost of this edge is less than $\beta$ times that of the best
available edge.

The weight of the edge returned, weight(e), is increased by the value of the demand
and the availability avail(e) is decreased by the same value. If there are any edge
failures at the current demand, the program generates a new internal demand and sets
avail(e) = weight(e) = 0 for the edge that failed. The program writes the total cost
along with weight(e), $\forall e \in E$, to the output file. STL map data structures with
double data types are used for edge and node properties. STL 2-D vectors with double
data types are used for tracking the weights allocated on the edges for each consumer.

For finding the optimal offline solution in the case of edge failures additional constraints 
are added to LP at each instance of edge failure, such that the currently allocated 
weights are not disturbed. This is done for each of the edge failures with the demand of 
the consumers at each stage being equal to their sum of demands until the edge failed. 

7.1 Analysis of Simulation Results 

In Fig. 1, as the number of users increases with the same number of resources, the cost
of the optimal offline algorithm also increases. The number of incoming edges to any producer
increases as the consumers increase. This increases the competition for the resources
available at the producers. The optimal offline algorithm has to select costlier edges due
to the limited availability of edges as the number of demands increases, and this increases
the overall cost. This is in line with Fig. 4 in [33], where the cost is referred to as the latency
(which is the overall time required to complete I/O requests).



The greedy algorithm (Fig. 2) closely follows the optimal offline algorithm at each step of
the input. It produces output whose cost is comparable to the optimal
solution and does not deviate much, apart from the cases where higher demands arrive
later, as described in the second paragraph of section 6.1.

The randomized algorithm does better than the greedy algorithm in cases where higher
demands arrive later in the sequence. In this case, the greedy algorithm fails badly as it
does not have any cheaper edges left towards the end and has to select costlier edges for
higher demands. However, the randomized algorithm (Fig. 3) does not cope well with graphs
with a higher number of nodes. The randomized algorithm pays the penalty of selecting a
suboptimal edge to get an overall cost improvement; however, due to the large size of the
demands it is not able to recover from this penalty and this causes a cascading effect. In
Fig. 3 the randomized online algorithm shows the highest standard deviation because
of the random nature of the selections.

8 Conclusion 

This paper concludes that it is possible to produce a competitive online algorithm for 
this problem using randomization. The competitiveness of this algorithm suffers for 
higher number of nodes and edges in the graph. This paper believes that it is possible 
to produce a competitive version of this algorithm by setting the suboptimal penalty 
to zero at the beginning of the algorithm and iteratively increasing it depending on the 
effect on the cost of the output. 

9 Appendix 

The results found in this paper are based on simulation experiments. It will be
interesting to measure the performance of these online algorithms for real-world I/O
demands, and to see how it scales with the size of the input.

Work on solving other versions of this problem, where the distances $d_{ij}$ of edges may
change over time, is currently in progress.

10 Glossary 

Definition 1. Dynamic graphs: Graphs that have a finite set of nodes and edges but
where the edges or nodes may become unavailable and available over time.

Definition 2. Derandomization: Process of removing randomness from an algorithm
to make it more deterministic.

Definition 3. Assignment problem: Given a bipartite graph with tasks and resources
on either side and a cost associated with allocating a task to a resource, the assignment
problem is to find the optimal allocation of the resources to the tasks (to minimize
an objective function).

Definition 4. Linear programming: A method used to solve large scale optimization
problems with set constraints and an objective function (minimize or maximize a quantity).

Definition 5. Combinatorial optimization: A certain class of optimization problems
that involves minimization of an objective function where there are multiple choices
available at each step.